Multistage Temporal Difference Learning for 2048-Like Games

Related Resources

Temporal Difference Learning for Nondeterministic Board Games

We use temporal difference (TD) learning to train neural networks for four nondeterministic board games: backgammon, hypergammon, pachisi, and Parcheesi. We investigate the influence of two variables on the development of these networks: first, the source of training data, either learner-vs.-self or learner-vs.-other game play; second, the choice of attributes used: a simple encoding of the boar...

Learning to Play Board Games using Temporal Difference Methods

A promising approach to learning to play board games is to use reinforcement learning algorithms that can learn a game position evaluation function. In this paper we examine and compare three different methods for generating training games: (1) learning by self-play, (2) learning by playing against an expert program, and (3) learning from viewing experts play against themselves. Although the third...

Deep Reinforcement Learning for 2048

In this paper, we explore the performance of a Reinforcement Learning algorithm using a Policy Neural Network to play the popular game 2048. After proposing a model of the state and action spaces, we review our learning process, and train a first model without incorporating any prior knowledge of the game. We prove that a simple Probabilistic Policy Network achieves a 4 times higher maxi...

Dual Temporal Difference Learning

Recently, researchers have investigated novel dual representations as a basis for dynamic programming and reinforcement learning algorithms. Although the convergence properties of classical dynamic programming algorithms have been established for dual representations, temporal difference learning algorithms have not yet been analyzed. In this paper, we study the convergence properties of tempor...

Preconditioned Temporal Difference Learning

LSTD is numerically unstable for some ergodic Markov chains with preferred visits among some states over the remaining ones, because the matrix that LSTD accumulates has a large condition number. In this paper, we propose a variant of temporal difference learning with high data efficiency. A class of preconditioned temporal difference learning algorithms is also proposed to speed up the new met...

Journal

Journal title: IEEE Transactions on Computational Intelligence and AI in Games

Year: 2017

ISSN: 1943-068X, 1943-0698

DOI: 10.1109/tciaig.2016.2593710